Pahle et al. BMC Systems Biology 2012, 6:86 
http://www.biomedcentral.eom/1752-0509/6/86 

Systems Biology 



RESEARCH ARTICLE Open Access 



Biochemical fluctuations, optimisation and the 
linear noise approximation 

Jurgen Pahle^*, Joseph D Challenger^, Pedro Mendes^'^ and Alan J McKane^ 



Abstract 

Background: Stochastic fluctuations in molecular numbers have been in many cases shown to be crucial for the 
understanding of biochemical systems. However, the systematic study of these fluctuations is severely hindered by 
the high computational demand of stochastic simulation algorithms. This is particularly problematic when, as is often 
the case, some or many model parameters are not well known. Here, we propose a solution to this problem, namely 
a combination of the linear noise approximation with optimisation methods. The linear noise approximation is used 
to efficiently estimate the covariances of particle numbers in the system. Combining it with optimisation methods in 
a closed-loop to find extrema of covariances within a possibly high-dimensional parameter space allows us to answer 
various questions. Examples are, what is the lowest amplitude of stochastic fluctuations possible within given 
parameter ranges? Or, which specific changes of parameter values lead to the increase of the correlation between 
certain chemical species? Unlike stochastic simulation methods, this has no requirement for small numbers of 
molecules and thus can be applied to cases where stochastic simulation is prohibitive. 

Results: We implemented our strategy in the software COPASI and show its applicability on two different models of 
mitogen-activated kinases (MARK) signalling - one generic model of extracellular signal-regulated kinases (ERK) and one 
model of signalling via p38 MARK. Using our method we were able to quickly find local maxima of covariances 
between particle numbers in the ERK model depending on the activities of phospho-MKKK and its corresponding 
phosphatase. With the p38 MARK model our method was able to efficiently find conditions under which the coefficient 
of variation of the output of the signalling system, namely the particle number of Hsp27, could be minimised. We also 
investigated correlations between the two parallel signalling branches (MKK3 and MKK6) in this model. 

Conclusions: Our strategy is a practical method for the efficient investigation of fluctuations in biochemical 
models even when some or many of the model parameters have not yet been fully characterised. 

Keywords: Linear noise approximation. Optimisation, Mitogen-activated kinases signalling, CORASI, Intrinsic noise, 
Stochastic biochemical models, Systems biology 




Background 

Random fluctuations in discrete molecular numbers can 
have significant impact, both detrimental and construc- 
tive, on the functioning of biochemical systems [1,2]. 
Systems that contain only relatively small numbers of 
particles of a certain chemical species, such as in signal 
transduction or gene expression, are particularly prone 
to this intrinsic noise. Here, the underlying discreteness 
of the system and stochastic timing of reactive events 
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can lead to fluctuations in species abundances of high 
relative amplitude. Even when particle numbers are 
high, stochastic effects can significantly affect the 
dynamic behaviour of certain biochemical networks [3]. 

Biochemical systems have evolved to be robust against 
molecular fluctuations by attenuation, or even to exploit 
them (see [4,5] for examples). Therefore, these fluctuations 
should be considered whenever quantitative and dynamic 
models are devised to describe biochemical systems. 
Different mathematical formalisms have been developed 
to allow stochastic modelling and to explicitly take into 
account random fluctuations. Such systems are usually 
modelled by a continuous-time Markov process which fol- 
lows the chemical master equation. The chemical master 
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equation describes the time evolution of the system state 
probability distribution, i.e. how probable it is that a 
chemical species in the system will have specific particle 
numbers at a specific point in time. Both analytic and 
numerical solutions of this chemical master equation are 
difficult to obtain for most biologically relevant systems. 
Even though there exist methods to numerically solve the 
master equation [6] these are only feasible for relatively 
simple systems. A popular substitute is to apply Gillespie's 
stochastic simulation algorithm [7] to calculate single tra- 
jectories of the system's dynamics. By calculating very 
many of such (random) instances one can then approxi- 
mate the trajectory of the probability density function of 
each chemical species and calculate relevant time-depen- 
dent statistics, such as the mean value or covariances. 
However, the stochastic simulation of single trajectories 
alone can be computationally demanding. The calculation 
of very many of them quickly becomes impracticable even 
when accelerated approximate stochastic simulation meth- 
ods [8] are employed. 

For a quick characterisation of the fluctuations in a bio- 
chemical system there exists an alternative, namely the 
linear noise approximation (LNA; see, e.g., [9-11]). This 
approximate method is based on van Kampen's system- 
size expansion of the chemical master equation [12-14]. 
The LNA estimates the variances of the species abun- 
dances and the covariances between them. Even though, 
theoretically, the LNA is only locally valid in the vicinity 
of macroscopic steady states or other system trajectories, 
in practical terms, it often gives good results even when 
the behaviour of the stochastic model and the behaviour 
of the corresponding deterministic model are quite dif- 
ferent [15]. The LNA is particularly interesting because it 
is independent of computationally demanding stochastic 
simulations but, instead, only uses information about the 
stoichiometries in the system and the macroscopic reac- 
tion rates - therefore it can be calculated very quickly. 
Other approaches have also been proposed for the esti- 
mation of steady state noise. For instance, in [16], analyti- 
cal estimates of the fluctuations are found using error 
growth techniques. These are based on ideas from non- 
linear dynamics and do not begin from a master equa- 
tion. This is in contrast to the work presented here, 
where the molecular basis of the model is central, and 
where the nature of the fluctuations can be explicitly cal- 
culated. There are, of course, many studies of fluctua- 
tions in biochemical systems. For instance, in [17] the 
authors use data from time series to infer the values of 
the model parameters. This is in some sense the converse 
of our approach. 

Often, in practice, one or more of the parameters of a 
model, such as reaction rates or initial concentrations, 
cannot be exactly determined. For instance, such para- 
meters might only be known to lie within a certain range 



or nothing might be known about them at all. This 
uncertainty about parameters can translate into uncer- 
tainty about the system behaviour when it has high sensi- 
tivity towards those parameters. This is also true for 
molecular fluctuations in the system since their expected 
amplitude and other properties depend on parameter 
values. If only one or two parameters are unknown it is 
possible to exhaustively scan this parameter space using a 
regular grid or other techniques to probe how the model 
is affected by variations in values of those parameters. 
However, this approach is not feasible if the number of 
unknown parameters is large since the hyper-volume of 
the parameter search space increases exponentially with 
the number of uncertain parameters, and consequently 
so does the computational time. 

In this article we introduce a different strategy to study 
random fluctuations in biochemical models with para- 
meters that are not well characterised. Our approach 
combines the LNA with optimisation methods to search 
the unknown parameter space for parameter values that 
lead to extrema in covariance estimates. This can drama- 
tically reduce the required computation time compared 
to exhaustive searches with stochastic simulations, 
thereby permitting types of studies of stochastic fluctua- 
tions that were not possible before. We will show a rele- 
vant biological example of a search for conditions that 
minimise the noise in the output of a p38 MAPK signal- 
ling system. Scanning the parameter space and using sto- 
chastic simulation is clearly impossible here because this 
would take more than 2.4 • 10^^ years. Our method, in 
contrast, was able to find these conditions in 25 min. 
Therefore, the strategy we are proposing makes it possi- 
ble to gain biological insight about the noise structure of 
relevant biological systems even if these systems are big 
and the parameters are not well defined. 

Global optimisation methods have been shown to be 
effective in finding good extrema estimates of dynamic 
properties of biochemical network models even in high- 
dimensional search spaces [18,19]. The strategy proposed 
here is similar to an earlier one successfully applied to 
the search for extreme values of sensitivities [20]. 

The application of this strategy passes through a closed 
loop containing the automatic calculation of a steady state, 
the LNA method and one optimisation algorithm; alterna- 
tively the method is also appropriate to use with para- 
meter scanning or sampling algorithms instead of the 
optimisation. We implemented this strategy in the soft- 
ware COPASI [21,22], which already contains optimisa- 
tion, scanning and sampling algorithms. We demonstrate 
the application of this new strategy on two different mod- 
els of mitogen-activated kinase (MAPK) signalling path- 
ways, namely a model of extracellular signal-regulated 
kinases (ERK) by Kholodenko [23] and a model of p38 
MAPK by Hendriks et al. [24]. 
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Results 

Implementation of the method in COPASI 

The software COPASI [21,22] gives all interested 
researchers easy access to modelling and simulation for 
biochemical networks, because it is freely available 
under the Artistic license version 2.0 at [22] and sup- 
ports the Systems Biology Markup Language (SBML) 
standard [25] for the exchange of model files with other 
software. An implementation of the method described 
here was integrated in COPASI, comprising a new LNA 
task that, using the linear noise approximation (see 
Methods), generates as output a matrix of covariance 
estimates between all the species' particle numbers in a 
given biochemical model (see Figure 1). Prior to this, 
the method can also automatically calculate a steady 
state for the model which is important if parameters, 
and thus the system's steady state, have been changed. 
The covariances estimated by the LNA task can then be 
subsequently used by other tasks in COPASI, in particu- 
lar optimisation, parameter scanning or sampling in a 
closed-loop fashion. This combination results in a prac- 
tical method for the investigation of fluctuations in 
models even when some or many of the model para- 
meters have not yet been fully characterised. 



Our implementation allows arbitrary objective func- 
tions to be optimised. For instance, LNA estimates of 
covariances of different chemical species, as well as 
other model observables, can be combined into a com- 
plex objective function. This allows the calculation of 
various quantities of interest, for instance, Fano factors 
[26] or coefficients of variation (CV), as shown below. 
In terms of parameter search, our implementation can 
use a large variety of numerical optimisation algorithms, 
both local and global, that are accessible in COPASI - 
gradient-based, particle swarm [27], simulated annealing, 
evolutionary algorithms and others [28]. This is particu- 
larly important since the performance of global optimi- 
sation algorithms has been shown to be problem- 
dependent, and no single one is guaranteed to converge 
to a global optimum for all problems [29]. 

Application of the method on MAPK signalling systems 

Signalling through mitogen-activated protein kinases 
(MAPK) is involved in a broad range of cellular pro- 
cesses, such as proliferation, differentiation, stress 
responses and apoptosis. Therefore it is also implicated 
in a variety of diseases like cancer, stroke or diabetes 
[30]. As such, it has been the object of a number of 
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Figure 1 Screen shot of the LNA implementation in COPASI. Screen shot of the COPASI graphical user interface and the linear noise 
approximation task. Shown is the resulting covariance matrix of species' particle numbers in the p38 MAPK model by Hendriks ef al. [24], The 
matrix is colour coded, positive values have a green background and negative values a red one with intensities corresponding to the absolute 
values. 
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computational modelling studies that helped elucidate 
dynamic properties of the system, such as amplification 
of signals, noise reduction or switching behaviour [31]. 

There exist different specific MAPK signalling path- 
ways with different functions, for example ERKl/2, p38 
or JNK, with different topologies and characteristics. 
However, in most cases the basic structure is that of a 
three-tier cascade. Here, the MAPKs on the output 
level, such as ERKl/2 or p38, phosphorylate transcrip- 
tion factors or other proteins to trigger specific cellular 
responses. The MAPKs are, in turn, activated via phos- 
phorylation by other protein kinases, so-called MAP2K 
(or MKK) that are themselves activated by MAP3K (or 
MKKK) further upstream. 

Fluctuations in a model of ultrasensitivity in ERK MAP 
kinase signalling 

We will now apply the LNA to a MAPK cascade model 
due to Kholodenko [23], which is a popular model of a 
generic extracellular signal-regulated kinases (ERK) 
MAPK signalling cascade. Due to a negative feedback 
loop, the model can exhibit limit cycle behaviour for 
some parameter values, and a stable steady-state for 
others. While Kholodenko examined the model in the 
limit cycle regime [23], we reduced the feedback 
strength by increasing the kinetic constant Kj to 45, so 
that a stable steady-state exists (all other parameter 
values remain as in the original paper). A typical sto- 
chastic simulation of the system is shown in Figure 2, 
simulated with Gillespie's Direct Method [7] (as imple- 
mented in COPASI). 

It is interesting to see how the magnitude of fluctua- 
tions changes with the reaction parameters. As an exam- 
ple, we used our LNA implementation in COPASI in 
combination with a parameter scan to investigate how 
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Figure 2 Stochastic simulation of the ERK iVlAPK model. MKKK 
and phospho-MKK particle numbers vs. time [s], with Kj = 45, = 
10'^^ I and all other parameters as in [23]. 



changes in the reaction parameter V2 affect the variance 
of MKKK (MAPK kinase kinase). Values of V2 were 
scanned within a certain range and the LNA automati- 
cally calculated for each value of V2- In the model, this 
parameter corresponds to the V^ax of phospho-MKKK 
dephosphorylation and so refers to the activity of 
MKKK-phosphatase. 

Presently protein kinases are much better characterised 
at the molecular level than protein phosphatases. As a 
consequence the effect of phosphatases are often also not 
studied in signalling models. However, here we are able 
to show that the activity of the MKKK-phosphatase does 
not only influence the type of dynamics the system exhi- 
bits, namely that the steady state becomes unstable at 
V2 = 0.446 due to a Hopf bifurcation. It also strongly 
affects the intrinsic fluctuations in the system. As can be 
seen in Figure 3, the estimated variance of MKKK 
becomes large as V2 approaches the bifurcation point 
and, interestingly, it shows a local maximum at V2 = 0.32 
of 987.7 particles^. The value of V2 in Figure 3 does not 
go as far as the bifurcation point, as the LNA loses accu- 
racy near this value. 

We then wanted to investigate the conditions under 
which fluctuations in chemical species at different posi- 
tions of the signalling cascade become correlated. To 
achieve this, we used the optimisation task in COPASI to 
maximise the covariance of the fluctuations of MKKK 
and MKK-P, allowing the reaction parameters V2 and /C4 
to vary over a given range of values. Using the evolution- 
ary programming algorithm [28] (which took 199 
seconds to run) 4004 steady state and LNA evaluations 
were carried out. A local maximum of the covariance was 
found with a value of 4035 particles for V2 = 0.3226 and 
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Figure 3 Parameter scan of IVIKKK particle number variance 
against reaction parameter V2 in the ERK MAPK model A 

parameter scan of the variance of the particle number of species 
MKKK has been carried out for a range of values of the reaction 
parameter V2, with K, = 45, = 10"" I and all other parameters as 
in [23]. 

^ J 
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= 0.0166. The algorithm converged to this value 
already after 880 iterations. A parameter scan over the 
same parameter space was also performed to better illus- 
trate the change in correlation with these two para- 
meters. Figure 4 shows how the covariance of the 
fluctuations of MKKK and MKK-P varies with the reac- 
tion parameters, and provides a visualisation of the land- 
scape that the optimisation algorithm must traverse. 
Note that the covariance becomes negative for some 
parameter values. 

Fluctuations in a model of p38 MAPK signalling 

The so-called p38 mitogen-activated protein kinases (p38 
MAPK) are responsive to proinflammatory cytokines and 
stress factors [32]. One prominent signal are lipopolysac- 
charides (LPS), which are components in the cell wall of 
bacteria. Their presence indicates a bacterial infection and 
triggers a strong immune response in animals. The MAPK 
of this pathway, p38, can, inter alia, activate MAP kinase- 
activated protein kinase 2 (MK2). One substrate of MK2 is 
the heat shock protein 27 (Hsp27) and the concentration 
of the active/phosphorylated form of Hsp27 is regularly 
used to estimate the activity of the p38 MAPK signalling 
pathway. The level of Hsp27 will also represent the main 
signalling output in the model. 

The model we use for this study was developed in 
Hendriks et al. 2008 [24]. Its structure is shown in the 
additional file 1. The original model included the rapid 
inactivation of a (TAK1:TAB1:TAB2) complex. This was 
represented by a degradation reaction which, after an 
initial peak, led to an abrogation of p38 MAPK activity. 
For this study we removed this degradation reaction 
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Figure 4 Two-dimensional parameter scan of MKKK and IVIKK-P 
particle numbers' covariance in the ERK MAP model. A two- 
dimensional parameter scan of the covariance of the particle 
numbers of species MKKK and IVIKK-P. The parameter V2 was varied 
between 0.22 and 0.41 and the parameter between 0.015 and 
0.035. 



which allows the system to reach a steady state of sus- 
tained p38 MAPK signalling depending on the amount 
of LPS. We also reformulated the model in such a way 
that it no longer contained three compartments (med- 
ium, cytosol and nucleus) but, instead, uses a single 
reference volume for all species, including the nuclear 
ones, corresponding to the volume of the whole cell. 
This was needed because the current implementation of 
the LNA can only handle models with one compart- 
ment. In [24] the model was fitted to experimental mea- 
surements, and in the following we will use the set of 
parameters which showed the best fit. 

As mentioned above, random fluctuations in signalling 
systems are particularly interesting to study, since here 
copy numbers of the different species are often low. For 
instance, MKK3 and MKK6 are typically present in the 
order of only ten thousand particles per cell. This could 
lead to pronounced fluctuations which hamper reliable 
information transfer through this signalling pathway. 
But perhaps there are conditions (parameter values) for 
which these fluctuations are minimised, which is what 
we want to investigate. 

First we looked at the estimated variances of different 
signalling intermediates, such as phospho-MKK3, phos- 
pho-MKK6, cytosolic phospho-p38 and nuclear phospho- 
p38 with varying stimulus strength, i.e. concentration of 
LPS (Figure 5, panel A). We performed a parameter scan 
in COPASI where the initial concentration of LPS was var- 
ied within a certain range and the LNA was automatically 
calculated at each LPS concentration. We found that the 
variances increase with increasing stimulus strength but 
saturate at high values of LPS (resembling hyperbolic 
functions). 

By contrast, phospho-Hsp27, the endpoint of the 
modelled signalling pathway, shows a decrease in its var- 
iance with increasing stimulation (Figure 5, panel B). 

However, looking at the coefficient of variation (CV) 
both nuclear phospho-p38 and cytosolic phospho-Hsp27 
show a decrease of variation with increasing stimulation 
due to increasing steady state particle numbers (Figure 6 
shows the CV of nuclear phospho-p38 against the con- 
centration of LPS). This means that, in both cases, the 
relative amplitude of fluctuations decreases with increas- 
ing signal strength - the higher the stimulus, the less 
ambiguous it becomes. 

An interesting property of the p38 MAPK pathway is the 
existence of two parallel signalling branches, through 
MKK3 and MKK6, that both can phosphorylate p38 
MAPK. Therefore, we were interested in whether fluctua- 
tions in the MKK3 branch correlate with fluctuations in 
the MKK6 branch. First, we scanned the estimated covar- 
iance of phospho-MKK3 and phospho-MKK6 over a 
range of stimulus strengths. We found that the fluctua- 
tions in the two branches seem to be mostly uncorrelated 
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[LPS](0) [ng/ml] [LPS](0) [ng/ml] 

Figure 5 Variances of species' particle numbers versus stimulation strength in the p38 MARK model. Panel A: Variance of cytosolic 
phosplio-MKK3 (x), pliosplio-IVlKKe (°), plnospho-p38 (o), and nuclear phosplio-p38 (A) particle numbers vs. strength of stimulation 
(concentration of LPS [ng/ml]). Panel B: Variance of cytosolic phospho-Hsp27 versus strength of stimulation (concentration of LPS [ng/ml]). 



(the LNA actually estimates a very weak anti-correlation 
for higher initial concentrations of LPS, data not shown), 
an indication that the largest part of the fluctuations does 
not originate from the common upstream part of the two 
branches but rather from within the branches themselves. 

We now wanted to investigate how the parameters in 
the system influence this anti-/correlation. Therefore, we 
searched for extreme values of the LNA-estimated corre- 
lation coefficient of phospho-MKK3 and phospho-MKK6 

covfphospho-MKK3, phospho-MKK61 
p (phospho-MKK3, phospho-MKK6) = , v v i 

^var(phospho-MKK3) * \/var(phospho-MKK6) 

within a fixed, but large, range of all parameter values. 

We therefore ran the LNA in combination with the 
particle swarm optimisation algorithm of COPASI, using 
the correlation coefficient as the objective function for 
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Figure 6 Coefficient of variation of nuclear phospho-p38 vs. 
stimulation strength in the MARK model Coefficient of variation 
of nuclear phospho-p38 vs. concentration of LPS [ng/ml]. 



maximisation. In addition we set constraints on the 
number of steady state particle numbers in the system. 
Both phospho-MKK3 and phospho-MKK6 particle num- 
bers were allowed to change only 4-fold, i.e. within 50% 
- 200% of their original values. All other species' particle 
numbers were allowed to change 100-fold, i.e. within 
10% - 1000% of their original values. The reasons for 
this were, firstly, that we did not primarily want to 
change the steady state of the system but rather only 
wanted to affect the fluctuations around the steady 
state. Secondly, if particle numbers are not constrained 
the optimisation often converges towards degenerate 
cases where one or both of the steady state particle 
numbers are very close to zero, i.e. the lower limit - a 
situation where the LNA estimation can have large 
errors due to its assumption of Gaussian fluctuations. 

We used a particle swarm optimisation [27] method 
with a swarm size of 50. The parameters to vary were 
all 29 reaction rates of the first 20 reactions in the 
model (see [24]), which includes all receptor (complex)- 
related reactions, both MKK3 and MKK6 branches, and 
the phosphorylation and dephosphorylation of p38. The 
parameters were allowed to change 100-fold, e.g. within 
10% - 1000% of their original values. With these settings 
our method was able to find conditions where the esti- 
mated correlation between phospho-MKK3 and phos- 
pho-MKK6 was larger than 0.95 with a computation 
time of roughly 70 min. 

Finally, we were interested in the influence that differ- 
ent choices for parameters in the two branches have on 
the fluctuations of the output of the signalling pathway 
(phospho-Hsp27) or, in other words, how reliable or 
noisy the overall signalling pathway can be. We used a 
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particle swarm optimisation (swarm size = 50) [27] 
in combination with the LNA to minimise the coeffi- 
cient of variation (CV) of phospho-Hsp27 



(CV(phospho-Hsp27) 



yvar(phospho-Hsp27) 



). The 



|phospho-Hsp27| 
parameters that were allowed to change were the 21 
reaction rates of all reactions listed in Table 1. All rates 
were allowed to change 4-fold (e.g. from 50% to 200% of 
their original value). Column "Changes (no constraints)" 
in Table 1 details how the optimisation minimised CV 
(phospho-Hsp27). Most of the rates were increased or 
decreased until they reached the given limits. Briefly, 
one can see that the phosphorylation steps of MKK3, 
MKK6 and p38 are made faster, whereas their respective 
dephosphorylations are made slower. Obviously, the CV 
can be minimised by just increasing the steady state par- 
ticle number and leaving the variance as it is. Because 
this was the result of our first attempt, we carried out a 
second run where we constrained the phospho-Hsp27 
particle number to stay below the limit of 4.65 • 10 par- 
ticles. With the original parameter set the steady-state 
particle number of phospho-Hsp27 was 4.647 • 10*^ par- 
ticles. The result of this second calculation is shown in 
column "Changes (constrained)" of Table 1. The most 
notable differences compared to the unconstrained case 
can be found in the MKK3 branch. Now the phosphory- 
lation of p38 by phospho-MKI<3 (MKK3P) is slower 
than in the original model. Also, the rates for both the 
binding and dissociation of phospho-p38 (p38P) and its 
phosphatase (p38_phosphatase) have been increased as 



well as the rate for the corresponding dephosphoryla- 
tion. The binding of phospho-MKK6 and its phospha- 
tase (MKK6_phosphatase) is now faster while the rate 
for the corresponding dissociation seems to be almost 
unchanged from its original value. 

We would like to note here that a (naive) comprehen- 
sive search for optima using a regular grid approach and 
stochastic simulations of the system in this particular 
case would have taken a prohibitively long computation 
time. Assuming that, within the given limits, we only 
look at ten different values per parameter we would 
have io<"° P"^">'='"^>= 10^1 sample points. For each 
point we would need to carry out a stochastic simula- 
tion that, including the calculation to allow the system 
to settle down to a steady state, takes approximately 
7700 s on a typical desktop computer (for a simulated 
time of 10000 s). Neglecting the time needed to calcu- 
late the actual statistics on the simulated time series this 
would lead to a computation time of more than lO^'^ • 
7700 s « 2.4 • lO'^'^ years. And this would only explore 
ten values of each parameter (i.e. it would be at a low 
resolution.) In contrast our method, using the linear 
noise approximation in combination with numerical 
optimisation, took 25 min to converge. This clearly 
shows the utility of the method we propose here: it 
makes tractable to calculate many phenomena that 
otherwise would be computationally prohibitive. Finally, 
although an approximation had to be adopted, it is typi- 
cally so good that this has very little impact on the 
accuracy of the method. 



Table 1 Optimisation of the coefficient of variation of phospho-Hsp27 particle numbers 


Reactions 


Changes 


Changes 




(no constraints) 


(constraints) 


complex + MKK6 <-> complex_MKK6 


=> 


=> 


complex_MKK6 complex + MKK6P 


=> 


=> 


MKK6_phosphatase + MKK6P ^ Ppase_MKK6P 


<= 




Ppase_MKK6P MKK6_phosphatase + MKK6 


<= 


<= 


complex + MKK3 complex_MKK3 


=> 


=> 


complex_MKK3 complex + MKK3P 


=> 


=> 


MKK3_phosphatase + MKK3P ^ Ppase_MKK3P 


•<= 


<= 


Ppase_MKK3P MKK3_phosphata5e + MKK3 


<= 


<= 


MKK6P + p38 ^ MKK6P_p38 


=> 


=> 


MKK6P_p38 MKK6P + p38P 


=> 


=> 


MKK3P + p38 ^ MKK3P_p38 


=> 


<= 


MKK3P_p38 MKK3P + p38P 


=> 


<= 


p38_phosphatase + p38P Ppase_38P 


<= 


«• 


Ppase_p38P -» p38_phosphatase + p38 


<= 


=> 



Optimisation of the coefficient of variation of phospho-Hsp27 particle numbers with regards to all 21 reaction parameters of the listed reactions {[LPS]o = 1 ng/ 
ml). "Changes (no constraints}" means that the coefficient was optimised without any further constraints, whereas "Changes (constrained)" means that during 
optimisation the phospho-Hsp27 particle number was constrained in the optimisation to stay below the limit of 4.65 million particles. "=>" {"<^") denotes an 
increase (decrease) in the forward rate and a decrease (increase) in the reverse rate, in case of a reversible reaction. means that both forward and reaction 
rates are increased and "~" means that the optimisation led to no clear change 
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Discussion 

Our contribution with this work is two-fold. First, we 
implemented the linear noise approximation in the freely 
available software COPASI, and thus made it accessible 
to a large group of users. Secondly, we showed how the 
LNA in combination with multi-dimensional parameter 
scans or with global numerical optimisation methods is 
appropriate to quickly characterise the influence of para- 
meters on intrinsic fluctuations in biochemical models 
even when there is considerable uncertainty about a 
number of parameters. We showed, with realistic bio- 
chemical signalling models, that using this approach one 
is able to explore parameter space such that conditions 
can be found for which there is minimal, or maximal, 
noise. It is also possible to search for conditions where 
specific model variables are highly (or poorly) correlated. 
This new method thus provides a new and important 
way to explore the universe of behaviours displayed by 
models. Given the importance of noise and fluctuations 
in intracellular biochemistry, this method is therefore of 
great value for the study of those systems. 

In the recent article by Komorowski et al. [33] a related 
method is proposed. There, the linear noise approximation 
is used to calculate Fisher information matrices for sto- 
chastic models, primarily to inform experimental design, 
e.g. by examining the information content of different 
experimental samples. Our approach, on the other hand, 
focuses on exploring the model independently of any phy- 
sical measurements. Therefore, the two approaches are 
complementary. 

In certain cases, however, care should be taken when 
using the LNA. This is due to the assumption that the 
fluctuations are Gaussian in nature. Problems can arise if 
the system is close to a boundary. For example, if the 
number of molecules for a particular species is very close 
to zero the probability distribution for the fluctuations 
becomes 'squashed' (which the LNA does not take into 
account), to satisfy the requirement that the probability 
to have a negative number of molecules present is zero. 
Boundaries can also arise due to conservation relations, 
which are discussed in the Methods section, as these add 
constraints to the system. When using the LNA in com- 
bination with one of the optimisation algorithms in 
COPASI, such systems near boundaries are sometimes 
found, especially when the user wishes to minimise a 
covariance, as we found when studying the p38 MAPK 
model. This is because the fluctuations can be very small 
when the system is close to a boundary, which can give 
the impression that the fluctuations of two different spe- 
cies are uncorrelated, which may not be the case away 
from the boundary. In these cases, adding constraints to 
the particle numbers (as we did when studying the p38 
MAPK model) helps to keep the system away from these 



states. The current implementation of the LNA in 
COPASI is only able to consider models in which the 
reactions all occur within one compartment. As many 
biochemical models involve multiple compartments we 
hope to extend our work, so that in future it will be pos- 
sible to use the LNA to study a wider range of models. 

Methods 

Biochemical network models of the kind we analyse 
here can be described as consisting of x species 
Yi,...,Y^ enclosed in a volume V. There will be M 
reactions which interconvert species: 

riiYi + . . . + r^^Yj^ ^ pu Vi + . . . + p^jY^ 

TimYi + . . . + r^M^i PimYi +■■■+ Pku'^k 

where the numbers r,^ and j?,^ {i = \, . . . ,k; fi = 1, . . . ,M) 
describe respectively the population of the reactants and 
the products involved in the reaction. This may be written 
more compactly as 

k k 

J2 '"'z^'^' ^ 12 P'M^" M = 1' 2, . . . M. (1) 

1=1 1=1 

All the reactions above are strictly irreversible, there- 
fore, without loss of generality, any chemically reversible 
reactions must be described as two separate irreversible 
reactions. The elements of the stoichiometry matrix, 
Vifi = Pifi - r,-^, describe how many particles of species Y; 
are transformed due to the reaction ft. Although there 
are ^ species present in the system, they may not all be 
able to vary independently. This is because mass conser- 
vation relations are often present in the system which 
cause some variables to be linear combinations of 
others. As a simple example of this, consider the 
Michaelis-Menten reaction mechanism in an open sys- 
tem, described in Table 2. 

A substrate, S, is converted to a product, P, via an 
enzyme E. The substrate and enzyme form a complex, 
SE. A constant flux of S molecules is supplied to the 
system and P molecules are able to leave the system. In 
our notation above, Yi = S, Y2 = E, Y3 = P and Y4 = SE. 
Also, rii = 1, = 0 and so on. The total number of 



Table 2 Michaelis-Menten reaction mechanism 



Reactions 


Kinetics 




\/, = k, 


S + E ^ SE 


V2 = k2' [S] . [E] 


SE ^ S + E 


V3 = k,- [SE] 


SE ^ P + E 


^4 = '(4 ■ [SE] 


P 


vs = ks' [P] 
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enzyme molecules, i.e. the number of free enzyme mole- 
cules plus the enzyme molecules bound in a complex, is 
fixed. If the number of SE molecules decreases by one, 
the number of E molecules increases by one. This con- 
servation relation means that there is only one indepen- 
dent variable here, not two. In general, if the system 
contains A conservation relations, then the dimension 
of the system can be reduced from ^ to = ^ _ A- It is 
necessary to reduce the size of the system in this way to 
facilitate the linear algebra to be done later. In the 
Michaelis-Menten system above, ^ = 4 and A = 1, so 
K=3. 

To specify the model, kinetic functions /^(n, V) asso- 
ciated with reaction fi need to be given. They are functions 
of the vector of particle numbers n = and 
volume V. Note that the vector of particle numbers has 
been 'shortened' from length }( to K using the conserva- 
tion relations. This will be discussed in more detail later in 
this section. In the limit where both the particle numbers 
and the volume become large, the kinetic functions 
become functions of the species concentration m,7 V only; 
we then denote them by^(;i;:), where Xi = limv_^oo n,/V! In 
this limit the conventional, macroscopic and deterministic, 
description of the systems applies and a set of ordinary dif- 
ferential equations (ODEs) can be written down to 
describe it: 



dxi 

la 



(2) 



(The ODEs for the species that have been eliminated 
can be found by using the conservation relations.) How- 
ever, the large system size limit is inappropriate for many 
systems of interest, in particular when the molecular 
populations are low (and the volume is small, as in most 
cells), then the discrete nature of the molecules has 
important consequences. In these cases a stochastic 
description is required. 

The starting point for the stochastic description is the 
chemical master equation, which specifies how the prob- 
ability that the system is in the state n at time t, P{n, t), 
changes with time. If T^,{n\n') is the transition rate from 
state n' to state n associated with reaction pi, then the 
master equation takes the form 

= E ["^"("1" - - t] - T^n + v^\n)P{n, t)], (3) 

where = (vi^, Vn^^^) is the stoichiometric vector 
corresponding to reaction ^. This completely defines the 
stochastic dynamics of the system once the initial condi- 
tion P{n, 0) is given. If we multiply Eq. (3) by n, and 
sum over all possible values of n one finds, after shifting 
the change of variable m — > m + v in the first term, 



d(n(0) 
dt 



(4) 



Dividing Eq. (4) by V and taking the limit V ^ °°i we 
see that we recover the deterministic description Eq. (2) 
if we make the identification ^(;*;) = limy_^^V^ {Tf,{n + 
v^|m)). To go further than the macroscopic description 
Eq. (2) we need to develop an approximation scheme 
which goes beyond the deterministic dynamics in a sys- 
tematic way. Fortunately such a scheme exists: the sys- 
tem-size expansion of van Kampen, which allows one to 
calculate corrections to the deterministic results in 
powers of V^^^ by writing njV = x + ^j-y/V, where x is 
found by solving Eq. (2). To next-to-leading order, 
which in general gives results in very good agreement 
with simulations, this is equivalent to assuming that the 
stochastic fluctuations are Gaussian, and so determined 
by stochastic processes which are linear. For this reason, 
this is frequently known as the linear noise approxima- 
tion (LNA). Details of the general application of the 
method are given in the book by van Kampen [14], and 
for chemical reactions of the kind we are considering 
here by Elf and Ehrenberg [9]. In [14,34], terms an 
order smaller than the LNA are included. In most cases 
the LNA is very accurate, and these extra terms are not 
significant, provided the steady-state solution is not near 
a boundary, in which case the fluctuations would no 
longer be Gaussian. One finds that the stochastic 
dynamics of the LNA is governed by the Fokker-Planck 
equation 



a^n 



9n 9 1 ^ 

— = - > — (H- (I) n) + - > B,, , 



(5) 



1=1 



Where n(|, t) = P{n, t) and M,(^) = X- j ^ij^r There- 
fore the entire dynamics is defined by two matrices A 
and B which are given by 



B,j{x) = ^Vi^VjiJ^{x). (6) 



In all the investigations we will carry out in this paper, 
we will be interested in fluctuations about the stationary 
state. In terms of the deterministic dynamics Eq. (2), the 
solution x{t) will be replaced by its fixed point value 
and so the A and B matrices will be independent of 
time. 

The Fokker-Planck Eq. (5) is linear, and so therefore 
its solution, n(^, t), is Gaussian, and may be charac- 
terised by its first two moments. The ansatz used to set 
up the system-size expansion implies that the first 
moment, {f;(f)), is zero to this order. In the stationary 
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State the covariance, will only depend on \t - 

t'\. Therefore, Eij = {ii{t)^j{t)) will be independent of 
time, and will satisfy [14] 



A3+ SA^ + B = 0. 



(7) 



All the matrices in Eq. (7) are dimension K x K and 
are independent of time. In applications we are fre- 
quently interested in the covariance in terms of particle 
numbers 



Cij = {{rii - (n,>) {rij - («;>)). 



(8) 



Since («,) = VXi and since the average of the fluctua- 
tions is zero to the order we are working. 



(9) 



The Lyapunov equation, analogous to Eq. (7) is there- 
fore 



AC+CA' +VB = 0. 



(10) 



The equation can be solved for C numerically by 
employing the Bartels-Stewart algorithm [35]. Here, A is 
transformed to lower real Schur form and is trans- 
formed to real Schur form. This allows elements of the 
transformed matrix C to be solved for successively. The 
solution of C is found by reversing the original transfor- 
mation. It is important that the conservation relations 
are used to reduce the dimension of the system before 
the Bartels-Stewart algorithm is applied. The matrices 
for the 'unreduced' system will contain linearly depen- 
dent columns or, equivalently, zero eigenvalues. When 
this is the case, the solution to the equation is no longer 
unique [35]. 

Therefore our implementation of the LNA first auto- 
matically determines existing conservation relations 
(also known as conserved moieties) and reduces the sys- 
tem from {( to K independent chemical species. Then 
the LNA is applied to the reduced system and the corre- 
sponding covariance matrix is calculated. In the last step 
the covariance matrix for the full system is recovered as 
follows. 

For convenience, the state vector n should be written 
with the K = k — A independent species first i.e. filling 
the first K positions, and then the dependent species 
should be written at the end. This anticipates shortening 
n (when the size of the system is reduced) to contain K 
elements, rather than The dependent species may be 
written in terms of the independent species by using the 
conservation equations which are linear combinations 
and so, in general, are of the form: 



K 

Cj + ^ ajktik, j 



K+1,...,K, 



(11) 



where the Cj and a,^ are constants. 

Examining the conservation relations after the change 
of variables used in the van Kampen expansion 
[rij = VXj + VV^j) is introduced, the above equation 
becomes: 



Vxj + Vv^j =Cj + J2 M^^k + Vv^k). 



(12) 



But the conservation equations should hold in the 
deterministic limit {V ^ c»), too, i.e. 



VXj = Cj + ^2 ^jkVXk. 



(13) 



Therefore, 

K 



(14) 



fe=i 



We can now use the above results to compute the 
remaining covariances. First of all we calculate H/y, 
where / is an independent species and / is a dependent 
species: 



(15) 



fe=i 



since (^i) = {^/) = 0. Now we have an expression for Hy 
in terms of known quantities, the covariances of the 
independent species, which are found from solving the 
Lyapunov equation. Now we calculate H,y for the case 
where i and / are both dependent species. 



(16) 



1=1 



Again, we have obtained an expression in terms of 
known quantities. As before, Cij = VH,y. Using a so- 
called link matrix L that connects the reduced and the 
full systems, as defined by Reder [36] 



/I 0\ 




/ 


I \ 










0 1 








V ^-0 ) 









(17) 



we can write this more concisely: 



C = LC^'^L^, 



(18) 



with C'"^'^' the K x K covariance matrix of the reduced 
system. 

We will illustrate the procedure by examining the 
Michaelis-Menten reaction mechanism, described earlier 
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in Table 2. The macroscopic model of the system, writ- 
ten as a set of ODEs, is as follows: 

dxi 
"dT 

dX2 

"dT 
"dT 

dX4 



Table 3 Covariance matrix C for the Michaelis-Menten 
reaction system. 







s 


SE 


P 


E 


= fel — 1^2X1X4 + h^X2, 


s 


1455.82 (1455.57) 


61.35 (61.23) 


33.46 (32.80) 


-51.35 (-51.23) 




SE 


61.35 (61.23) 


59.09 (59.07) 


-4.36 (-4.43) 


-59.09 (-59.07) 


= k2XiX4 — {ks + k4)X2, 


p 


33.46 (32.80) 


-4.36 (-4.43) 


773.85 (77341) 


4.36 (443) 




(19) E 


-61.35 (-61.23) 


-59.09 (-59.07) 


4.35 (443) 


59.09 (59.07) 



k4X2 — k^Xs, 

{ks + k4)X2 — k2XiX4, 



where Xi is the concentration of species S, X2 is the 
concentration of SE, X3 is the concentration of P and X4 
is the concentration of E. The system contains one con- 
servation relation, as the total number of enzyme mole- 
cules (whether they are free, or bound in the 
intermediate complex) is constant. We will write this as 
X2 + X4 = P, where is a constant. Therefore, we can 
eliminate X4 from the ODEs, and re-write them in a 
simpler form, 



dxi 

"df " ^^^'^^^ 

dxs . 

—— = k4X2 — 

dt 



k2Xi{P — X2) + ksX2, 
— X2) — {ks + k4)X2, 

ksXs. 



(20) 



The steady state is calculated by setting the time deri- 
vatives to zero and solving the resulting equations 
simultaneously. The steady state values for the concen- 
trations are shown below: 



kiks + k\k4 



— X* = — 

fe4 ' ^ fes ' 



(21) 



From Eq. (6), A and B are found to be: 

(-k2{P - x*2) k2X\ +k3 0 ' 

k2iP - xl) -k2x\ - [ki + k4) 0 
0 fe4 -fes , 



[fel + k2X\{fi — X2) + ^-iX^ —'k2X\{P — ■'^2) ~ ^-iX^ 0 
—k2X\{fi — X*2) — }Z^X\ k2X\{fi — X*2) + {kj, + ki^x\ —^4X2 
0 —^4X2 ksX\ + ki^x\ 



(22) 



(23) 



Once values of the reaction parameters have been cho- 
sen, numerical values of A and B may be found. The cov- 
ariance matrix C can then be solved by using the Bartels- 
Stewart algorithm. Table 3 shows the entries of the cov- 
ariance matrix calculated using the LNA in COPASI, and 
compares them with values obtained from simulation. 

As just mentioned, we implemented the LNA 
described above in the software COPASI [21,22]. 
COPASI is a widely used software for the analysis and 
simulation of biochemical networks. It lets the users 



The convariance matrix C for the Michaelis-Menten reaction system. Reaction 
parameters were chosen to be = 0.2 nMs"\ /C2 = 4 nM"^s"\ k^ = i s"\ = 1 
s"\ ks = 0.15 s"V The system volume was 10"^^ I. The covariances calculated 
using the LNA was compared with those obtained from simulation (values in 
brackets) using lO'^ time series generated using the Gillespie algorithm (Direct 
Method) in COPASI 



access sophisticated mathematical methods, such as 
deterministic, stochastic and hybrid simulation, meta- 
bolic control analysis, sensitivity analysis, optimisation 
and parameter fitting, to study their models. COPASI 
also allows closed-loop applications of parameter scan- 
ning, sampling and optimisation with one of the other 
analyses, for example sensitivity analysis or the linear 
noise approximation. Models can be conveniently 
imported and exported using the Systems Biology 
Markup Language (SBML) [25]. COPASI is an open 
source software and is freely available under the Artistic 
license version 2.0 at [22]. 

Briefly, our LNA implementation in COPASI first 
detects dependent species (conservation relations) and 
carries out the corresponding reduction of the system, if 
needed. Then an automatic search for a steady state of 
the system is started. If a steady state has been found 
the Lyapunov equation Eq. (10) for the reduced system 
is solved at this steady state using the Bartels-Stewart 
algorithm. Finally, the covariance matrix for the full sys- 
tem is recovered as described above. 

In addition, before the LNA is carried out COPASI 
automatically checks the model according to a number 
of criteria that preclude a direct calculation of the LNA. 
For instance, if there are reversible reactions present in 
the model COPASI will notify the user that all reversible 
reactions have to be split into irreversible reactions 
before the LNA can be applied. There exists a tool in 
COPASI which can do this in an automated way for a 
large class of models. 

Optimisation is a general modelling tool with a wide 
application to the solution of diverse problems. Essen- 
tially, if something can be specified as a maximum or 
minimum of some function, optimisation will be the way 
to solve such a problem. In biochemical network model- 
ling the most common application is parameter estima- 
tion; another one is the design of genetically engineered 
pathways (commonly known as metabolic engineering) 
where one seeks to maximise a flux, titre or a yield of a 
biotransformation [18]. Other popular applications are 
those where a specific parameter set is sought that 
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produces a desired behaviour of the model. This is the 
basis of a method that tabulates all maxima and minima 
of each parameter sensitivity towards a specific model 
variable [20], as a means of approaching global sensitivity 
analysis. All of these applications require that a simulator 
be integrated with an optimisation algorithm in a closed 
loop. 

There are many different numeric algorithms for search- 
ing minima (or maxima) of functions: the traditional gra- 
dient-based methods, direct search that use geometric 
heuristics, population-based algorithms like evolutionary 
algorithms and particle swarm [27,28], and stochastic 
searches like simulated annealing. Often problems are 
complex in that the objective function is not convex and 
can have several local minima yet one seeks one of the 
global minima. For such problems it is necessary to 
employ algorithms that are not trapped in local minima, 
as are the gradient-based algorithms. Empirical evidence 
shows that population-based and stochastic search algo- 
rithms are commonly the most efficient at finding global 
minima. In our experience with biochemical networks this 
is usually achieved by evolutionary algorithms [19] or the 
particle swarm algorithm. All of these algorithms are avail- 
able in our COPASI implementation. 

Additional material 



Additional file 1: Model of p38 MARK signalling The model of p38 
MARK signalling [24] used. Diagram created with CellDesigner version 4.1 
[37,38]. 
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